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1 Introduction 


The testing problem in functional data analysis framework is motivated by an example on 
studying executive functions in children with Hemiplegic Cerebral Palsy. The Big/Little Circle 
(BLC) test is an attention measure that tests comprehension, learning and reversal of a rule (see 
e.g. Moore and Puri, 2012). In this study, the data on BLC mean correct latency was collected 
from 141 students, aging from 6 to 13, who completed the BLC test. Among them, 56% are 
action video game players (AVGPs) and 44% are non-action video game players (NAVGPs) as 
shown in Figure 1. Let Ti(t) and Y 2 {t) be the BLC mean correct latency for NAVGPs and 
AVGPs groups respectively, where t is the age of children. They are continuous functional 
variables although observations are collected at discrete points. We are interested in identifying 
ages that the means of Li(t) and Y 2 {t) have significant difference. In particular, we wish to 
detect the specific areas of age where the significant differences occur. We will refer such areas 
as significant areas. Thus we wish to detect the significant areas automatically and at the same 
time minimize the false nondiscovery rate while controlling false discovery rates. 



Age (years) 


Figure 1: The scatterplot of BLC mean correct latency data in AVGPs group (.) and 
NAVGPs group (x). 

Functional data analysis (FDA) has emerged as a popular area of statistics over the last 
decade for the analysis of data with functional features, such as growth curves, motion and 
image data. Ramsay and Silverman (2005) and Ramsay et al. (2009) offered applied-oriented 
introductions to the ideas and tools of FDA. Ferraty and Romain (2011) reviewed some recent 
theoretical developments of FDA. Other important directions related to statistical inference in 
FDA includes Bosq (2000), Yao et al. (2005), Muller (2005), Ferraty and Vieu (2006), Di et al. 
(2009), Horvath and Kokoszka (2012), and Wang and Shi (2014) among many others. However, 
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hypothesis testing with directional error control on detecting areas in which differences of the 
mean curves of two samples are significant (i.e. detecting the significant areas) has received 
little attention. Inspired by the recent development of large-scale multiple testing for complex 
big data (see e.g. Zhang et ah, 2011, Lee and Lee, 2014 and Sun et al. 2015), we propose an 
automatic detection procedure to find significant areas and allow control of the directional error 
at the same time. 

Testing differences in the mean functions of two samples of curves has been approached in 
many literatures. For example, Zhang et al. (2010) introduced an L^-norm based test, Horvath 
et al. (2013) developed a test based on the sample means of the curves, and Staicu et al. (2014) 
proposed a pseudo likelihood ratio test. Extension to multiple samples of curves was discussed 
in Shen and Faraway (2004). Cuevas et al. (2004), Estevez-Perez and Vilar (2008) and Cuesta- 
Albertos and Febrero-Bande (2010) further extended it to the functional analysis of variance. 
Those works all focused on detecting the overall difference. However, we are often interested in 
determining the sub-areas of the functional domain (temporal or spatial) where the mean curves 
are significant different in many problems such as the motivating example we discussed earlier. 
To identify specific areas for a significant difference, Ramsay and Silverman (2005) proposed 
a pointwise t-test without multiplicity control, and Cox and Lee (2008) applied the Westfall- 
Young randomization method to control the family-wise error rate (FWER). However, when the 
number of null hypotheses is large, lack of multiplicity control is too permissive, while the full 
protection resulting from controlling the FWER is too stringent. 

Compared with FWER in the context of multiple testing, the false discovery rate (FDR) 
introduced by Benjamini and Hochberg (1995) has received great attention during the past 
decade. Lots of procedures have been proposed in large-scale scientific studies with goals of 
controlling the FDR. For instance, Benjamini and Hochberg (1995) provided a sequential p-value 
method to control FDR; Sun and Cai (2009) introduced an asymptotical optimal procedure with 
test statistics under dependence; Liu et al. (2012) proposed a graphical-model based multiple 
testing procedure to genome-wide association studies; Lee and Bjprnstad (2013) expressed the 
problem of multiple testing as an inference problem with basic responses. Other relevant works 
are Storey (2002), Efron (2004, 2007), Genovese and Wasserman (2004), Zhang et al. (2011), 
French and Sain (2013) and some of the references therein. When the tests are two-sided as 
in our motivating example, it often becomes essential for researchers to further determine the 
direction of significance, rather than significance alone. Then, the decisions can potentially lead 
to three types of errors for each test: Type I error if the null hypothesis is true but rejected, 
Type H error if the null hypothesis is not true but failed to reject, and Type HI error if the null 
hypothesis is not true but the direction of the alternative is falsely declared. To deal with Type I 
as well as Type HI errors in the FDR framework, Benjamini and Yekutieli (2005) proposed a so- 
called directional Benjamini-Hochberg (BH) procedure for independent tests, Guo et al. (2010) 
extended the directional BH procedure based on the Bonferroni test to gene expression data with 
ordered categories, Clements et al. (2014) introduced a three-stage directional BH procedure to 
study vegetation fluctuations, and Lee and Lee (2014) developed an optimal extended likelihood 
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test with directional FDRs under hidden Markov random field models. However, the multiple 
testing problems mentioned above are all restricted to the assumption that each hypothesis has 
its own observed data, while in our motivating example, we only observed BLC mean current 
latency at finite time points in age range of [6, 13] but we need to make decisions at any age 
(time) between 6 and 13. Recently, Sun et al. (2015) developed an asymptotic optimal data- 
driven procedure that controls the FDR for multiple testing on a continuous domain, where 
the optimality is restricted in a set that test statistic satisfies monotone ratio condition. Their 
method is confined to change detection of one curve that may not be applicable to test differences 
in the two mean curves. And they derived the oracle procedure for two-sided tests by only 
controlling the FDR related to Type I error, which implies that their method may not be 
powerful in multiple tests with more than two actions. 

To address the issue, we propose a new directional FDR procedure for detecting differences in 
the mean functions of two samples of functional data observed at discrete grid points. This would 
be the first attempt to handle two-sample multiple testing for detecting mean differences by 
controlling FDR in functional data analysis framework. In contrast to pointwise testing idea, we 
introduce a nonparametric Gaussian process regression model for directional two-sided multiple 
tests. It provides a natural framework on modeling mean structure and covariance structure of 
the difference between two curves simultaneously and the latter can be used to effectively extract 
information from nearby points for decision making. In the spirit of definitions in discrete cases, 
we define the directional FDRs for the continuous hypothesis testing process, and derive a test 
which optimally controls directional FDRs among all decision rules for multiple testing. Further, 
to make the continuous decision process applicable, a procedure is proposed by approximating 
the optimal test on a continuum. It is shown that it can control directional FDRs at any specified 
level asymptotically. Compared with conventional methods, our simulation studies manifest the 
drastically improved performance of the proposed procedure on directional error control and 
power. 

The rest of the paper is organized as follows. In Section 2, we formulate the multiple testing 
problem and introduce directional FDRs for this continuous hypothesis testing process. Section 3 
derives the optimal test with directional FDRs and presents a procedure for implementation. 
In Section 4, we investigate the finite sample performance of the proposed procedure by simu¬ 
lation studies and an application to the executive function study. The method is extended to 
equivalence tests in Section 5. The paper is concluded with a discussion in Section 6 and all the 
technical details are relegated to Appendix. 

2 Problem formulation and directional FDRs 

In this section, we formulate the multiple testing problem of detecting differences in the mean 
functions of two samples of curves and introduce directional FDRs on a continuum. 

Let Yg{t),g = 1,2 be two curves of functional data, which are functions of t. In functional 


4 


data analysis, t denotes a real-valued variable, which could be time or some other temporal 
or spatial variable. In this paper, without loss of generality, we assume t is time as in our 
motivating example and the corresponding time range is a closed interval T; for simplicity take 
T = [0,1]. We are interested in detecting differences between £(1^1 (t)) and E(Y 2 (t)) over time 
on T. Specifically, consider the following functional regression model 

Yi{t) = ^i{t) + + ei{t), 

Y2{t) = ix{t) + e2{t), ( 1 ) 

where //(•) and /Urf(-) are unknown functions and ei{t) and e 2 {t) are the independent random 
errors. Then, for each time t, we are interested in the directional two-sided test 

Ho{t) : \i^d{t)\ < A 

versus Hi{t) : Hd{t) < -A or H 2 {t) : Hd{t) > A, (2) 

where A is a pre-specified constant, denoting the size of difference we are interested in. Assume 
that there is an underlying state z{t) associated with each time t taking one of three states. We 
set z{t) = 0 if hypothesis at time t is the null and z{t) = 1 or 2 if hypothesis at time t is the 
alternative 1 or 2, respectively. Let 5{t) € {0,1,2} be a decision rule for the hypothesis HQ{t). 
If 6{t) = z{t), the hypothesis is correctly identified by the decision rule, otherwise there exist 
errors. Let Rk = {t € T : 5{t) = k} and Vjk = {t € T : z{t) = j,5{t) = k} for j,k = 0,1,2. 
Table 1 summarizes the possible outcomes of multiple testing with two alternatives, which shows 
that there exist three types of errors in the directional two-sided multiple testing (2). 


Table 1: Outcomes of multiple testing with two alternatives 



Declared as null 

5{t) = 0 

Declared as alternative 1 

m = 1 

Declared as alternative 2 

6{t) = 2 

Total 

Null (z(t) = 0) 


Vbo 

Vbi {Type I error) 

V 02 (Type I error) 

To 

Alternative 1 {z{t) 

= 1) 

kio {Type II error) 

Til 

V 12 {Type III error) 

Ti 

Alternative 2 {z{t) 

= 2) 

Y 20 (Type II error) 

V21 {Type III error) 

V 22 

T 2 

Total 

i?o 

Ri 

i?2 

T 


Let C{-) be the Lebesgue measure on time range T. Then, C{Ni) = T(Vbi) -|- C{Vq 2 ) and 
C{N 2 ) = C{Viq) + C{V 2 o) are the sizes of areas corresponding to Type I and Type II errors, 
respectively, and C{N^) = C{Vi 2 ) -f T(V 2 i) is the size of area corresponding to Type III error, 
a directional error. When the interest is to test hypotheses at individual time points, a natural 
and practical way is to control an error rate in the FDR framework by considering all of these 
three types of errors. Thus, in this paper, we propose to control either the sum of Type I and 
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Type III errors while minimizing the Type II error or control the Type I error while minimizing 
the sum of Type II and Type III errors. Let a V 6 = max{a, b}. Define FDR and the marginal 
FDR (mFDR) for Type I error rate as 


FDRi = E 


£(iVi) 


£(Ri UR 2 ) V 1 

those for Type III error rate as 

FDRiii = E I - 

\ C{Ri U R 2 ) V 1 

and those for the sum of the Type I and Type 


FDRi+iii = E 


f £(iViUA^3) 

\£(Ri UR 2 ) V 1 


and mEDRi 


E{/:(iVi)} 

E{C{RiUR2)}' 


and mEDRin 


E{£(jV3)} 
E{/:(Ri UR2)}’ 


III error rates as 


and mEDRi 4 .n 1 


E{C{N,UNs)} 

E{C{R^UR2)}' 


Besides the error rate for discoveries, we can define similar error rate for false nondiscoveries, 
the false nondiscovery rate and the marginal false nondiscovery rate 


FNDR = E 


J^{N2) ] 
C{Ro) V 1 / 


and inENDR 


E{/:(iV2)} 

e{/:(Ro)}’ 


which is related to Type II error. Eurther, to compute the power of a single directional two-sided 
testing procedure, Leventhal and Huynh (1996) recommended excluding Type III error from the 
conventional power. Therefore, in this paper, we define a modified power (MP) of a directional 
two-sided multiple testing procedure by considering both Type II and Type III errors 

MP = 1 - 

E{£(TiUr2)) 


Remark 1. Lee and Lee (2014) created a similar table to summarize the outcomes of multiple 
testing with two alternatives and defined the eorresponding directional FDRs. The key difference 
here is that for continuous testing process ( 2 ), the false discovery measures are related to the sizes 
of areas corresponding to three types of errors, which couldn’t be calculated directly by counting the 
number of cases as in discrete case where each hypothesis has its own observed data. Therefore, 
a new strategy is needed to develop for inference based on the continuous functional data analysis 
framework hut using the data observed at discrete points. 


3 Optimal tests for automatic detection of significant 
areas 

3.1 Optimal procedures for controlling directional FDRs 

Suppose the observed data {{Yii,tu) : i = 1,... ,ni} and {{Y 2 i,t 2 i) : i = 1,... , 712 } are realiza¬ 
tions of two underlying stochastic processes from model (1). The notation of the time points. 
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tii and t 2 i, allows for different observation points in the two groups, and tu's and t 2 i’s consist 
of subsets of T. Our objective is to predict the states of hypothesis z{t) G {0,1, 2} at any time 
point t G T in an optimal way. Therefore, it is necessary to exploit the temporal correlations 
and extract information from nearby points for prediction. Consider a loss function 

L{d, z; A) = AiT(A^i) + As^CiVa) + (3) 

where Ai,A 2 and A 3 are relative costs. The following theorem derives the optimal rule for the 
weighted classification problem (3). 

Theorem 1. Let V be the whole data set consisting of {{Yii,tii) : i = 1,... ,ni} and {(T 2 iT 2 i) : 
i = 1,... , 712 }. Assume all parameters in model (1) are known. Then, 

(1) 7 /A 2 = 1 and Ai = A 3 = A m (3), the optimal decision rule : t G 

T} = axgm\n^E{L{5,z-,X)\T>} becomes 

5 (/+m)(^) ^ 2 i/ j ~ ^ ° j > a «nd P(z(t) = 2\V)> P{z{t) = 1\V), 

= 1 if ^ = 2 I P) < P(2(t) = 1 I P), 

= 0 otherwise] 

(2) if Xi = X and A 2 = A 3 = 1 in (3), the optimal decision rule (5^^^ = {6^^\t) : t G T} = 
argmin 5 £^{L( 5 , z] A) 12?} becomes 

V) > P{z{t) = 1 I 2?), 

2?) < P(z(t) = 1 I 2?), 

= 0 otherwise. 

Theorem 1 gives the optimal rules for various weighted classification problems. We next show 
that the optimality property can be extended to the multiple testing problems with respect to 
various directional FDRs defined in Section 2. 

Theorem 2 . Let A = : A > 0} 6 e the collection of decision rules in form of 

derived in Theorem 1. Given an mFiARi+ni level a, let 6 = {6{t) : t £ T} be any decision rule 
satisfying mFDRi+iii{6} < a. Then, there exists a X determined by 5 such that G A 

performs better than 6 in the sense that 

mFDRi+iii{(5(^+^^^)} < mFDRi+iii{(5} < a. 


6^^\t) = 2 if 

= 1 if 


P{z{t) = 2 I 2?) 
P{z{t) = 0 I 2?) 

Pirn = 1 I P) 

P(z(t) = 0 I 2?) 


> A and P(z(t) = 2 

> A and P(5;(t) = 2 


and 

mFNDR{<5(^+^^^)} < mFNDR{5}. 
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Theorem 2 demonstrates that the optimal decision rule for controlling the sum of Type I and 
Type III errors with the smallest Type II error belongs to the set A. In other words, one only 
needs to search in A for the optimal rule, instead of searching for all decision rules. Similarly, it 
can be shown that the optimal decision rule for controlling Type I error with the smallest sum 
of Type II and Type III errors is in the form of derived in Theorem 1. 

Remark 2. In the case of = 0, Sun et al. (2015) showed the optimal solution to the weighted 
classification problem is optimal in {5 : 5{t) = I{T{t) < c},T statisfies monotone ratio condition} 
for the multiple testing problem, but Theorem 2 extends the result to a more general case, re¬ 
vealing that this solution is even optimal among all decision rules for the multiple testing. 

3.2 Extension to practical situations 

It is not straightforward to use the optimal procedures described in Section 3.1 because (a) 
it is impossible to make an uncountable number of decisions on T, and (b) the true smooth 
trajectories pfi) and Pd{') are not directly observable and thus the test statistics should be 
evaluated at unobserved time points. In this section, we develop procedures for directional 
FDRs control to overcome these difficulties. 

To address (a), we first divide the interval T = [0,1] into N equal-length subintervals [sj-i, Sj) 
with So = 0 and Sj = Sj_i -|- 1/A^, i = 1,... , N — 1, and pick the center point t* in [sj_i, sf), 
i = 1,... ,N — 1. Then, for a decision rule 5, we have 

A 1 ^ 

/ E{/(5(t) / 0)P{zit) = 0\V)}dC{t) = hm -Y,^{So{t*)mt*) + 0)}, 
1 ^ 

= 1)+ simm) = 2 )}, and 

i=l 

^ N 2 

i=l k=lj^k 
N 2 

E{/(<5(t*) = k){l- SAt*))}, 

i=\ k=l 

where function Sk{t) = P(.2(t) = k\T>), k = 0,1,2. Therefore, motivated by the limit definition 
of a definite integral, mFDRi can be estimated by 

_ 1 N 

mFDRi(A) = - So{t*)im) 0) (4) 

i=l 

for any given A and all parameters in model (1), where r = I{d{t}) 0). According to 

Theorem 1, it is easy to see that d^^\t) = 0 if So{t) < (1 -|- A)“^. Suppose that Ai and A 2 are 
chosen so that < (1 -|- A 2 )~^ < < (1 -|- Ai)“^ < /i(r+ 2 )) where h(^r) is the rth smallest 


E{C{NA} = 
E{C{N,)} = 
E{C{NiUNA} = 


value of 5o(f*). Then, 


r r+1 

mFDRi(A 2 ) - mFDRi(Ai) = ^ h(j) - (r + 1)"^ ^/i(i) 

i=l i=\ 

r 

= {r(r + -r/i(^+i)} < 0. 

i=l 


Thus, mFDRi monotonically decreases with A, and we propose the following step-down test 
procedure for FDRi control: 


let A* 

with 


inf {A : mFDRi(A) < a}; then 

N 

<t < Si)S^^^t*) 

i=l 

2 if > A* and S 2 {t*) > 

1 if > A* and S 2 {t*) < 

0 otherwise. 


(5) 


The following theorem shows that this test controls FDRi at level a asymptotically, which 
implies that the proposed procedure (5) approximates a multiple comparison correction for a 
continuous comparison process ( 2 ) as the grid for pointwise comparisons becomes finer. 

Theorem 3. Let {U^;^[si_i, Sj) : N = 1,2,...} be a sequence of partitions of T satisfying 
Conditions Cl and C2 in the Appendix. Then, the FDRi level of procedure (5) satisfies FDRi < 
a -|- o(l) when N —>■ oo. 

Remark 3. For simplicity, we choose the center point t* in each subinterval as a 

representative point. But from the proof of Theorem 3, we can see that, no matter which point 
is chosen as a representative point in the proposed procedure (5) controls FDRi o,t the 

nominal level asymptotically as long as Conditions Cl and C2 are fulfilled. 

Similarly, by using mFDRi+iii(A) = = fc)(l “ 5'fc(ti)), we control 

FDRi+iii at the nominal level. However, they are still difficult to implement because of (b). 

Further to address (b), we propose a Gaussian process regression (GPR) model for (1) to 
estimate unknown quantities Sk{t) = P(z(t) = k\V), k = 0, 1 , 2 . GPR model is a good choice as 
a globally approximated nonlinear functional regression model in (1) (in contrast with locally 
approximated model for most of conventional nonparametric model); see the details in Shi and 
Choi (2011) and Wang and Shi (2014). Specifically, consider {/r(t) : t gT} and {p-dit) ■ t gT} as 
independent random processes and suppose they have Gaussian process priors with zero means 
and kernel functions k(-,-;? 7 ) and 6), respectively, where Cov(/r(t),/i(t')) = K{t,t']r]) and 
Cov (udit), Td{t')) = 'y{t,t']6). Assume that {ei(t) : t G T} and {e 2 {t) : t G T} are Gaussian 
white noise processes with zero mean and variance which are independent from each other 
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and to both {n{t) : t £ T} and {Hd{t) ■ t £ T}. One example of the kernel function 7 (-, •; 6) is 
the following squared exponential covariance function with a nonstationary linear term: 

- tj)^/ 2 } + CUtj, ( 6 ) 

where 9 = (^, cj, () is a set of hyper-parameters. When C = 0, the kernel function 7 (-, •; 9) reduces 
to so-called squared exponential covariance function, which is stationary and nondegenerate 
(Rasmussen and Williams, 2006). The parameter uj corresponds to the smoothing parameters 
in spline. So, we call uj~^ the length-scale. A large length-scale implies the underlying curve is 
expected to be essentially flat and the decrease in length-scale results in more rapidly fluctuating 
functions. 

Let n = m + n 2 , 0 = {r]'^, 9^, , and = {Y'^, Y JY with i = (Tfi,..., Yi^Y 

Y 2 = (L 21 ,..., Y 2 n 2 Y■ Consider the joint density function of Y, p, and pd 

m ^2 

f&iY,p,pd) = </>(A I 0,Kn)YP'd I 0,r„J I Ytii) + tJ-d{tii),cr‘^)'\\4>iY2i I 

i=l i=l 

(7) 

where /i = (/i(tii),..., ^(^ 21 ), • • •,//(t 2 n 2 ))^, P-d = (/Ud(t 2 i), • • •, ^^^(^ 2 ^ 2 ))^, </>(•) is the 

density of (multivariate) normal distribution, Kn and are covariance matrices of p and pd, 
respectively, with (i,j)th element K{ti,tj]r]) and 'y{ti,tj;9). Then, the parameters 0 can be 
consistently estimated by maximizing the likelihood (see Shi and Choi, 2011) 

l{e-,Y) = fQ{Y) = 11 fQ{Y,p,Pd)dpdPd. 

/V /s (7^ 

Let 0 = (? 7 ^, 9 , a'^Y be the estimates of 0. Then, we can make inference about fdd{t) by using 
f^YdY) I 7?). As the sample size n goes to infinity, we have f^ilddit) \ V) —)• f^indit) I 7?). 
Now, we consider how to make inference about = {^d{ti), ■ ■ ■, ldd{t*YY j where T* = 
is a collection of the center points based on partition T = Sj). It is not 

difficult to prove (see the details in Appendix B) that the conditional distribution of given 
the data set P is a multivariate normal distribution with mean and covariance given by 

P = E{^X^ I V) = + {Ini - YuYnlYYilnl " Yu)Y^ - Y^2Y2}, (8) 

A = Cov(/i^ \V) = Tn- ^{T*)TY^'^{T*) + a'^^{T*)nY^^i'I'*)^ 

where T'(T*) is the Nxrii covariance matrix between fx^ and pd with (i, j)th element 7 ( 1 *, 0), 

Tjv is the covariance matrix of fx^ with {i,j)tli element 'y{t*,tj;9), X) is a n x n block matrix 
given by 

E = Kn{Kn + a^In)-^ = ( \ 

\ Y 21 2j 22 y 

with In being a n x n identity matrix, and Lini = cr^r„i -|- Tni{Ini — ^ii)rni. Therefore, 
to calculate mFDRi define in (4), we draw M samples {p^ ■ ^ = 1,...,M} from the con¬ 
ditional distribution of fx^ = {fXd{tl), ■ ■ ■, fXd{t*YY^ where p^ = {p^i^ ■ ■ ■ ^be mth 
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A^-dimensional sample predicting the values at time points ... Then, we can approximate 
mFDRi by replacing So{t*) by its estimate So{t*). More specifically, note that 


So{t*) = P{z{t*) = 0\V)=E[I{\f^,{t*)\<A}\V] 
= j < A}(/)(/x^ I fi,A)dfi^. 


Thus, So{t*) can be estimated by 

1 ^ 

= M E S A}- 

m=l 

Similarly, to implement procedure (5), we compute Si{t*) and 52(t*) by 

1 ^ 

m=l 
1 ^ 

and S 2 {t*) = 

m=l 


respectively. 


Remark 4. The joint density function defined in (7) is the h-likelihood (Lee and Nelder, 1996) 
when we treat fi and fid o-s unobservable random variables. It eontains all the information in 
the data for parameters 0 and unobservable random variables fi and fid (Bj0rnstad, 1996). The 
method diseussed above ean also be extended to a fully Bayesian way by assuming a hyper-prior 
distribution for 0; see Shi and Choi (2011). 

Remark 5. Sun et al. (2015) used the similar approximation strategy to mimic the optimal 
proeedure as in (5) . But for implementation, they applied a Bayesian computational algorithm 
and drew MCMC samples during the iterations to estimate So{t*), which can be rather com¬ 
putationally intensive when the number of representative points N is large. While for the pro¬ 
posed procedure, we can get the estimates of unknown parameters efficiently by using the nice 
proprieties of GPR models and estimate Sk{t*),k = 0,1,2 directly by generating the samples 
from multivariate normal distribution with mean and covariance given in (8). GPR models can 
cope with multiple covariates and thus the proposed method can be easily extend to problems in 
multivariate funetional domain for example in 3-dimensional spatial domain or f-dimensional 
temporal/spatial domain dynamieal fMRI images. 


4 Numerical study 

4.1 Simulation studies 

In this subsection, we conduct a set of simulation studies to assess the hnite sample performance 
of the proposed method. The purpose is twofold. First, we compare our method with directional 
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Benjamini-Hochberg procedure (Benjamini and Hochberg, 1995) and directional Benjamini- 
Yekutieli procedure (Benjamini and Yekutieli, 2001, 2005). Since both of them only work for 
discrete case where each hypothesis has its own observed data, in Example 1, we assume two 
curves observed at the same set of time points and restrict the analysis for testing hypotheses 
at this set to permit comparisons, which means we have ni = n 2 = N. Second, we evaluate the 
performance of our method in Example 2 to test hypotheses on a continuum T with two curves 
observed at different discrete grid points. 

Example 1 We generate 200 datasets from model (1), where both //(•) and Hd{-) are 
Gaussian processes with zero means and covariance functions K{ti, = 3exp{—(tj — tj)^} and 

= lOexpj—— tj)^/2|, respectively, implying two stationery processes, and the 
error processes ei(-) and e 2 (-) are white noise processes with zero mean and finite variance cr^ = 1. 
For each simulated dataset, data are generated at = 500 time points ti ~ Uniform([6,13]). 
For all simulations, we choose A = 0.80 so that the expected proportion of time points with 
< A is 20% and set the nominal level as a = 0.10. To study the effects of correlation, we 
vary u resulting in the curve Hd{-) from smooth to fluctuating. 

Figure 2 plots FDRi+ni and FDRi as functions of u at the nominal level 0.10 and Figure 3 
shows the averages of FNDR and MR over the 200 datasets. We can see that the proposed 
method control FDRj and FDRi+m reasonably well. When uj becomes larger, there is a in¬ 
creasing chance to detect ^dit) to be non-null and declare it to be less than —A or larger than 
A. That is, as the correlation of the signals decaying, it is more possible to make directional 
errors. As expected. Figure 3(a) shows that the proposed method may have relatively large 
FNDR when oj is quite large. Correspondingly, Figure 3(b) implies that it may encounter loss 
of power. Though the directional Benjamini-Yekutieli procedure accounts for dependence, it 
is the most conservative and therefore the least powerful. The directional Benjamini-Hochberg 
procedure, derived under the independence assumption, controls the FDRi conservatively as the 
original Benjamini and Hochberg’s procedure (Benjamini and Hochberg, 1995), which controls 
the FDRi at a level smaller than the desired a. 

Example 2 In this example, the true model is the same as in Example 1, except that 
the process /i(-) is a Gaussian process with zero mean but a nonstationary covariance function 
= 3ex.p{—{ti—tj)'^}+3titj. The sampling design for two curves is balanced (ni = n 2 = 
200), but irregular, and furthermore different across the two samples. Specifically, we assume 
that {tii : i = l,...,ni} and {t 2 i ■ i = l,...,n 2 } are iid realizations from Uniform([6,13]) 
with 20% overlapping. Predictions are made and tests of (2) are conducted at center points of 
N = 500 equal-length subintervals covering the time range [6,13]. For all simulations, we set 
A = 0.80, a = 0.10 and vary the value of w as in Example 1 and repeat our procedure 200 times 
for each configuration. 

Figure 4 depicts the distribution of FDRi+ni and FDRi and Figure 5 presents the distribution 
of FNDR and MP over 200 replications. We can see that the proposed method maintains 
FDRi_|_iii and FDRi properly no matter the curve Udi') is smooth or wiggly. It is in accordance 
with Theorem 3 in Section 3.2. And it implies that the proposed procedure is robust under 
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(a) (b) 



Figure 2: Comparison of directional Benjamini-Hochberg procedure (+), directional 
Benjamini-Yekutieli procedure (o) and the proposed method (o): (a) FDRi+m versus 
cj; (b) FDRi versus oo. 


(a) 


(b) 




Figure 3: Comparison of directional Benjamini-Hochberg procedure (-I-), directional 
Benjamini-Yekutieli procedure (o) and the proposed method (o): (a) FNDR versus u 
under FDRi+ni at 0.10; (b) MP versus u under FDRi at 0.10. 
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different magnitudes of dependence across the values of /rrf(-). Moreover, the boxplots of FNDR 
and MP show that the proposed procedure is powerful, where the MP is 0.95 even when cj is 
very large. 

To investigate the consistency of the estimated directional errors, the values of the estimated 
E{/:(iVi)}, E{/:(fV 2 )}, E{C{N 3 )}, E{/:(Ro)}, E{/:(i?i)} and E{/:{R 2 )} are averaged so that 
the directional errors are calculated and regarded as the true values. Table 2 compares them 
with the estimated directional errors when a; = 80. We observe that the proposed procedure 
gives consistent estimators. Slight underestimation of mFDR explains slightly liberal control of 
directional FDRs when uj is large. 


(a) 



CO 


(b) 



CO 


Figure 4: The boxplots of FDRi+m and FDRi based on 200 replications, respectively. 
The boxplots’ horizontal lines are the 0.05, 0.25, 0.50, 0.75 and 0.95 quantiles of FDRi+m 
and FDRi versus uj, and the numbers of above the boxplots are the means of FDRi+ni 
and FDRi. 


4.2 Real data analysis 

To illustrate the proposed method, we analyze BLC mean correct latency for action video game 
players (AVGPs) and non action video game players (NAVGPs). The data consists of 84 girls 
and 57 boys from primary and secondary schools, aging from 6 to 13 years old. They were 
recruited to answer the video game playing questionnaire. Using data from the questionnaire, 
which were collected separately from children and from their parents for verification, these 141 
students were subdivided into two groups: the AVGPs group (56%) and the NAVGPs group 
(44%). Then, they were required to finish the Big/Little Gircle test via an action video game, 
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(a) 


(b) 




Figure 5: The boxplots of FNDR and MP based on 200 replications, respectively. The 
boxplots’ horizontal lines are the 0.05, 0.25, 0.50, 0.75 and 0.95 quantiles of FNDR and 
MP versus ca, and the numbers of above the boxplots are the means of FNDR and MP. 

Table 2: True errors and averages (standard deviation) of estimated errors when cu = 80 


When controlling FDRi at 0.1 


Errors 

mFDRi 

mFDRiii 

mFDRi+iii 

mFNDR 

True 

0.099 

0.003 

0.102 

0.260 

Estimated 

0.098 

0.003 

0.101 

0.259 


(0.002) 

(0.002) 

(0.002) 

(0.040) 

When controlling FDRi+ni at 0.1 

Errors 

mFDRi 

mFDRiii 

mFDRi+iii 

mFNDR 

True 

0.096 

0.003 

0.099 

0.256 

Estimated 

0.095 

0.003 

0.098 

0.254 


(0.002) 

(0.002) 

(0.000) 

(0.040) 


which was defined as a video game genre that emphasizes hand-eye coordination and reaction¬ 
time. Our objective is to detect the areas of age that the significant differences between AVGPs 
group and NAVGPs group occur. 

We use the Gaussian process regression model (1) to fit the data for each group. The 
estimated mean curves corresponding to AVGPs group and NAVGPs group are given in Figure 6. 
We can see that there are some crossings between these two curves. We first consider a test with 
from (2) and A = 20, chosen by our collaborators in neuroscience. 

We generate samples based on the conditional distribution of /x^ on center points of 500 
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equal-length subintervals covering the age range [6,13], and test the hypotheses at each time 
point. Figure 6 shows the significant and non-significant areas detected by the proposed proce¬ 
dure, when controlling FDRi+m at level 0.10. Aging from 6 to 9, the NAVGPs have significantly 
higher BLC mean correct latency than AVGPs, while after 9 years old, they have non-significant 
differences. It implies that the video game-based therapy may have significant effect on children 
with hemiplegia aging from 6 to 9 years old, while it may have limited help with of some of 
symptoms when they are more than 9 years old. The proposed procedure reports mFDRj = 0.08 
and mFDRin = 0.02, indicating the Type III errors account for about 20% of mFDRi+ni- It 
reports mFNDR = 0.22, implying that the means of BLG mean correct latency of NAVGPs and 
AVGPs groups could have differences larger than A = 20 in 22% of areas of age after 9 years 
old. 



Age (years) 

Figure 6: The significant and non-significant areas detected by the proposed procedure 
under mFDRi+in control at 0.10. Aging from 6 to 9, the NAVGPs have significantly higher 
BLC mean correct latency than AVGPs {S{t) = 2), while after 9 years old, they have non¬ 
significant differences {6{t) = 0). The solid and the dash lines represent the estimated 
mean curves for the NAVGPs group (x) and the AVGPs group (.), respectively. The 
estimates of errors are: mFDRi = 0.08, mFDRin = 0.02, and mFNDR = 0.22. 

Using different values of A in (2) makes the method very flexible. Figure 7 presents the 
results with A = 1 and A = 100. The estimated mFDRi, mFDRni and mFNDR are also 
calculated and presented. The former indicates there are two significant areas: one from age 6 
to 9.2 and the other from 9.6 to 10.6. Consequently, with a smaller A mFNDR increases, i.e. 
there could exist 44% of areas, among declared non-significant areas, that the mean difference of 
these two groups is larger than A = 1. The results for A = 100 imply that there is no detected 
significant area, while there would exist 30% of areas that the mean difference of two groups is 
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larger than A = 100 in whole age range [6,13]. It is not snrprising that there is no rejection 
at all when A > 100. This rather large nnmber makes the result meaningless. In general, the 
choice of A depends on a scientihc question of interest. 


Deita=i 


Delta=100 



Age (years) 



6 7 8 9 10 11 12 13 

Age (years) 


Figure 7: The significant and non-significant areas detected by the proposed procedure 
under mFDRi+m control at 0.10. Left (A = 1): aging from 6 to 9.2 and from 9.6 to 10.6, 
the NAVGPs have significantly higher BLC mean correct latency than AVGPs {6{t) = 2), 
while they have non-significant differences at other ages {S{t) = 0). The estimates of 
errors are: mFDRi = 0.01, mFDRni = 0.09, and mFNDR = 0.44; right (A = 100): there 
is no rejection which implies that no significant area is detected. The estimates of errors 
are: mFDRi = 0, mFDRni = 0, and mFNDR = 0.30. 


5 Equivalence tests 

A statistical hypothesis test is a decision rule to check whether the null hypothesis is justifiable 
given the observed data. We could reject the null hypothesis when there is strong evidence that 
it is wrong, but we could never prove it. Therefore, failure to reject 77o(t) in (2) does not mean 
that the difference between mean functions of two curves Yi{t) and Y 2 {t) is no more than A at 
time t. To demonstrate similarity rather than showing differences, we sometimes need to put 
the similarity hypothesis into the alternative. We might thus consider the multiple testing 

: Pd(t) < -A^ or Ro2(i) : > A^ 

versus Hi{t) : |pd(t)| < A^, (9) 

where A^ is called equivalence margin that is typically chosen as a limit below which differences 
are practically meaningful, and we call the test (9) the equivalence testing for functional data. 

Equivalence tests have gained increasing attention during the past two decades. The goal of 
an equivalence test is to establish practical equivalence, which is popular used in application areas 
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such as medicine and biology. There are lots of procedures that have been proposed to conduct 
equivalence tests for scalar data. For example, Schuirmann (1987) proposed the two one-sided 
tests procedure for bioequivalence; Anderson and Hauck (1990) suggested the comparison of 
both mean and variance of the two responses when assess a generic drug’s performance relative 
to a brand name drug; Brown et al. (1997) developed an unbiased test for the bioequivalence 
problem; Wang et al. (1999) discussed ways to construct a test simultaneously for all the 
individual pharmacokinetic parameters; Romano (2005) proposed a optimal test for testing the 
mean of a multivariate normal mean. Other relevant works include Chow and Liu (1992), Berger 
and Hsu (1996), Meyners (2012) and some of the references therein. However, in some cases 
the question of practical equivalence cannot be reduced to a hypothesis regarding scalar data. 
Recently, Fogarty and Small (2014) extended the equivalence testing framework to the functional 
regime. They considered an equivalence testing for overall mean difference. But they cannot 
test areas of the function domain with location parity. Therefore, it will be interesting to extend 
the proposed idea to equivalence testing (9). 

Let z^{t) be the underlying state at time t. We set z^{t) = 1 or 2 if hypothesis at time t 
is the null 1 or 2 and z^{t) = 3 if hypothesis at time t is the alternative. Let 6^{t) G {1,2,3} 
be a decision rule for the hypothesis (9). Let = {t G T : 6^ (t) = k} and = {i g 
T : z^{t) = = A:} for j,k = 1,2,3. Similar to directional two-sided test (2), there also 

exist three types of errors in equivalence testing (9). Table 3 sums up the possible outcomes of 
multiple testing with two nulls. Then, T(A'}^) = C{V^) + C{N^) = C.{V^) + C,{V^) and 

C{Nf) = C{V^) + C-{V^) are the sizes of areas corresponding to Type I, Type H and Type HI 
errors, respectively, where £(•) is the Lebesgue measure on T. Hence, we define the marginal 
false discovery rate as mFDR^ = E{T(A^{')}/E{T(R3)}, the marginal false nondiscoveary rate 
for Type H error as mFNDR^ = E{T(A"|')}/E{£(Rf U Rf)} and that for Type HI error as 
mFNDRfii = E{/:(iV3E)}/E{£(Rf U Rf)}. And for simplicity, let mENDEg+ni = mFNDRfj + 
mFNDR^j. 


Table 3: Outcomes of multiple testing with two nulls 




Declared as null 1 

Declared as null 2 

Declared as alternative 

Total 



5^{t) = 1 

S^{t) = 2 

II 

CO 


Null 1 = 

1) 

t^E 

Vii 

Fi 2 (Type III error) 

Fi 3 {Type I error) 

rf 

Null 2 = 

2) 

{Type III error) 

^22 

F 23 {Type I error) 

t^e 

-^2 

Alternative (z® 

(t) = 3) 

{Type II error) 

{Type II error) 

Vi 

t^e 

-^3 

Total 

Rf 

pE 

Jri2 

Rf 

T 


We applied the equivalence testing (9) to the executive function study. The results are 
presented in Figure 8. The non-significant areas (i.e. the mean curves are different) obtained 
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by using = 140 is similar to the ones using test (2) with A = 20 (see Figure 6). One 
reason might be the mFDR^ (analogous to Type I error) controlled here is actually the mFNDR 
(analogous to Type II error) in the multiple testing (2), and the mFNDR^^jjj (analogous to the 
sum of Type II and III errors) minimized in the equivalence testing (9) is actually the mFDRi+in 
(analogous to the sum of Type I and III errors) in test (2), which shows the clear differences 
between these two different types of test. As expected when A^ < 100, there is no rejection. 


Delta=l30 


Della=140 




Age (years) 


Figure 8: The equivalent and non-equivalent areas detected by the proposed procedure 
under niFDR^ control at 0.10. Left (A^ = 130): aging from 9 to 10 and after 10.5, the 
BLC mean correct latency of NAVGPs and AVGPs are similar (h^(t) = 3), while the 
NAVGPs have higher BLG mean correct latency than AVGPs at other ages (5^(t) = 2). 
The estimates of errors are: niFDR^ = 0.10, mFNDRj^ = 0.25, and niFNDR^j = 0.001; 
Right (A^ = 140): aging after 8.8, the BLG mean correct latency of NAVGPs and 
AVGPs are similar (h^(t) = 3), while the NAVGPs have higher BLG mean correct latency 
than AVGPs from 6 to 8.8 = 2). The estimates of errors are: mFDR^ = 0.10, 

mFNDR^ = 0.27, and mFNDR^j = 0. 

6 Discussion 

In this paper we proposed a method based on large scale multiple testing to detect differences 
of the means of two curves. It can automatically detect the significant areas and at the same 
time control the directional error. By taking advantage of the functional nature of the data, 
we introduce a nonparametric Gaussian process regression model for simultaneous two-sided 
tests. We are thus able to make inference at any point in a continuum and derive a procedure 
which optimally controls directional false discovery rates. To make it workable in practice, 
an approximation procedure is proposed via a finite approximation strategy. We show that the 
proposed procedure controls directional false discovery rates at any specified level asymptotically. 
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Related to the topic discussed in this paper, some interesting problems are worth further 
development. Though simulation studies validate the good control ability of the proposed proce¬ 
dure over both Type I and directional errors, the estimation of the unknown model parameters 
may affect the power of the testing method. It is therefore important for us to discuss the 
asymptotic optimality of the data-driven procedure with estimated model parameters in a more 
systematic fashion. And this paper focuses mainly on the problem defined in one-dimensional 
domain. It will be interesting to extend the idea to more complicated case, such as the problem 
defined in two- or three-dimensional spatial domain, or in temporal-spatio domain. Gaussian 
process regression model can cope with problems with multidimensional covariates. This good 
feature makes such extension feasible. On the other side of the spectrum, Fogarty and Small 
(2014) considered an equivalence testing for overall mean difference with dynamic bands. To ex¬ 
tend the equivalence testing (9) to a more general case with dynamic lower and upper equivalence 
bands is another interesting direction for future investigation. 
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Appendix 


.fc=i ■ 


Appendix A. Technical Proofs 

Proof of Theorem 1. We first prove Theorem 1(1). If A 2 = 1 and Ai = A 3 = A, the loss 
function (3) becomes 

L{6, z; A) = C{N 2 ) + A{£(A^i) + ^(iVs)}, 

which can be re-written as 

2 r 2 

L{6, z;X) = ^ y I{zit) = k)I{5it) = 0)dC{t) + A /(z(t) = 0)I{S{t) = k)dC{t) 

+ '^Y1 [ = k)I{5{t) = j)dC{t) 

j=i k^o,j 

Then, the posterior classification risk is 

2 f 2 

E{L(,5, z;X)\V} = V / I{5{t) = 0)P(z(t) = k \ V)dC{t) + A J V / I{5{t) = k) 

k=iJT [k=iJT 

F{z{t) = 0 I V)dC{t) + EE [ I{5{t) = j)Fiz{t) = k I V)dC{t) 

j=l kjtOJ ' 

= j I = 0)P(2(*) # 0 I P) + A ^ /(«(() = (:)P( 2 (*) # t I P) I (i£(t) 

= j(p( 2 (») # 01 p) I /(«(() = o) + '£i{nt) = 

Therefore, the optimal decision rule : t G T} = argmin 5 E{L((i, z; A)|P} is 

^(/+m)(^) = if ^ ^ j > A and F{z{t) = k\ V) = maxP(z(t) = j \ V), 

= 0 otherwise, 


which finishes the proof of Theorem 1(1). Similar arguments can be used to prove Theorem 1(2). 

□ 

Proof of Theorem 2. Given an inFDRi+ni level a, consider a decision rule <5 = {S{t) : 
t G T} with mEDRi+inld} < a. Let R be the expected rejection area for <5. Define T(t) = 
{mini<fc <2 P(^;(t) = k \ 'D) + F{z{t) = 0 | 'D)}/F{z{t) / 0 | P). Then, according to the definition 
of its corresponding expected rejection area is 

R(A) = E [ < X-^)dC{t) = [ F{T{t) <X-^)dC{t). 

Jt Jt 

Hence, R{X) is decreasing with A. In addition, it is easy to see that 

lim ^ I = 1 , and lim R{X) = 0. 

\^ 0 C{T) ’ A^oo ^ ^ 
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Consequently, for a given expected rejection area R determined by 6 , there exists a unique X{R) 
such that the decision rule has the same expected rejection area. 

Further, for define TD5(/+/i/), FD^(i+i/i)j and as the expected true 

discovery area, expected false discovery area related to Type I error and expected false discovery 
area related to Type III error, respectively. Then, we have 

^ r 

/ I{z{t) = = k)dC{t), 

k=i 

^ r 

Ve / I{z{t) = = k)dC{t), 

k=i 

j=l k^0,j dT 

and R{\) = TD5(/+/i/) + Similarly, let TD^, ED^/ and FDsm be 

the expected true discovery area, expected false discovery area related to Type I error and 
expected false discovery area related to Type III error for <5, respectively. Then, it also holds 
that R{X) = TD^ + ED57 + FD^///. For ( = consider the loss function 


TD5(/+/i/) = 

FD5(/+/i/)j = 

FD^(/+J77)^^^ = 


Liz,C) = £(iV2) + A{T(iVi)+£(iV3)} 

2 r 2 


k=l ' 


.fc=l ■ 


+E E / = j)dc{t) 


j=l k^OJ ^ 

Then, the risk for 6 and is 

2 

"e 

It 


FL{z, C) = ^ E / I{z{t) = k){l - I{C{t) + 0 )}d/:{t) + A(FDc/ + FDcm) 

k=i 

2 2 

= [ Y 1 - E / ^ I{z{t) = k)im = k)dC{t) 

J^k=i d^k=i 

r ^ 

-^tEE Iiz{t) = k)I{C{t) = j)dC{t) + A(FDc/ + FD^m) 

j=l k^OJ 

r ^ 

= Y. - (TDc + FD^m). 

T 1 _1 


k=l 


Since EL(2;, < FL{z, 6 ), it implies that FD^(7+///)j+ FD^(7+///)jjj < FD5/ + FD5777 and 

TD5(/+///) + FD^(7+///)jjj > TD5 + FDsill- Therefore, 

mFDRi+i„{5(^+^^^)} = FD,(r+r77), + FD,(7+777),,, ^ FD„ + FD„„ ^ 


R{X) 


R{X) 
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and 


mFNDR(i<'+-)} = < mFNDRIS}. 

□ 

To prove the procedure (5) is asymptotically valid for FDRi control, we first need the 
following regularity conditions. 

Cl Let p > 0 be a small positive constant. For /tq = — A or A, J’^P(|//d(t)—/.tol </o)dT(t) —^ 0 
as p —)• 0. 

C2 Let <t< Si). Assume the sequence of partitions Sj) : 

N = 1, 2,...} satisfies that for any given p > 0, JrpFdfidit) — (t)| > p)dC{t) —>■ 0 as 

N ^ oo 


Conditions Cl and C2 are similar to conditions 1-2 in Sun et al. (2015). Condition Cl states 
that {pd{t) '■ t £ T} is a smooth process that does not degenerate at both points —A and A. 
It is naturally holds when {pdit) ■ t £ T} is a continuous random process, which ensures that 
the inequality between z{t) and (t) only occurs with a small chance when {t) — Pd{t)\ 
is small, where z^{t) = <t< Si). Condition C2 requires that the partition 

T = Si) should produce roughly homogeneous subintervals so that the decision at the 

center point t* can be a good representation of the decision process on subinterval [sj_i,Sj). 
Then, we will need a lemma of Sun et al. (2015) (Lemma 2). We re-state the result. 


Lemma 6.1. Under conditions Cl and C2, lim^v-j-ooP(2(^) 7^ z^{t))dC{t) = 0. 

Proof of Theorem 3. Let Sk{t) = F{z{t) = k \ V), A: = 0,1, 2. According to the dehnition 
of FDRi, the FDRi level of procedure (5) is 


FDRj < E 


= E 


1 


C{Ri U Ra) V 1 Jq 

N 


So{t)I{6^^Ht)^0)dC{t)] 




N 


= E 


N{C{Ri U Ra) V 1) ^ 




where A„ = E[{£(R, U He) V l)-‘ ^.=1 HH'-’Ht'i) ^ 0)/.'■_.(So(fn - S„(r))<!£{t)]. 

Eurther, let (t) = F{z^{t) = k \ T>), k = 0,1, 2. Note that EjR^ (t) — 5a;(A)| = F{z^{t) = 
k, z{t) ^ k) + F{z{t) = k, z^(t) 7 ^ k). Then, an application of Lemma 6.1 yields that 

^ { £(H. U He) V 1 r ^ 

< [' E[/(5W(t) / 0){S^it) - So{t)}]dC{t) 

Jo 

< 2 [ F{z{t) ^ z^{t))dC{t) 0, 
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where the second inequality follows from the fact that {C{Ri U R 2 ) V 1} ^ < 1. Since the 
proposed procedure guarantees that 


1 

N{C{RiVJR2) V 1) 


N 

i=l 


< a 


for all realization of P, the FDRi is controlled at level a asymptotically. 


□ 


Appendix B. Derivation of equations ( 8 ) 

Note that fQ{Y, jl, fid) = /©(^)/ 0 (A) P-d \ R), where /©(h") does not contain any information 
about fi and fid- Hence, we have 

f&{P,Pd\'R) oc fQ{Y,p,,p-d) 

ni 712 

oc (t){fi I 0,Kn)(j){pd I 0,r„i) I /i(tii) + fdd{tii),a‘^)'\\(l){Y 2 i \ /u(t 2 i), cr^)- 

2=1 2=1 

Then, it is straightforward to know that 

f&iPd\Y>) = j fQ{p-,fid\V)dfi 

oc expi^-^{fid-A-^b)^A{fid-A-^b)'^ , 

where A = cr‘^Tfl + Ini — and b = (Ini — Sn)!^ 1 — 5]i2l^2- It implies that fid = A~^& + ei 
with ei ~ A^(0, a'^A~^). On the other hand, note that (Aj) follows a multivariate normal 

distribution with mean zero and covariance matrix T, where 

^ ^ / r„i \ 

T^(r*) Tn ) ■ 

Thus, we have = ^{T*)TflfLd + €2 with 62 ~ A^(0,rjv - Conse¬ 
quently, = 'If(T*)rflA~^b + -I- e 2 , so the conditional distribution of 

given T> is a multivariate normal distribution with mean and covariance matrix 

a^^{T*)TflA~^Tfl^^{T*) + TN - (T*), i.e., \V^ N{fi,A). 



